Clinical characteristics and cancer-specific survival analysis of double primary cancer patients with lung cancer as the first primary cancer

The objective of this study was to explore the prognostic factors of double primary cancer patients with lung cancer as the first primary cancer (FPC). The Surveillance, Epidemiology, and End Results (SEER) database, established by the National Cancer Institute for cancer registration, collects relatively complete demographic and clinical data for assessing the epidemiological characteristics of cancer. Clinical data on patients with a clear histopathological diagnosis of double primary cancer with lung cancer as the FPC, diagnosed from 2010 to 2015, were identified and collected from the SEER database. Survival curves were plotted using Kaplan–Meier analysis, and independent prognostic factors were identified using the Cox proportional hazards model. A total of 9306 patients were included: 6516 in the modeling group and 2790 in the validation group. Among patients whose FPC was lung cancer, the most common site of the second primary cancer (SPC) was the respiratory system (54.0%), and the most common site of the first primary lung cancer was the right upper lobe (33.3%). A total of 14 independent prognostic factors were identified, and the survival nomogram constructed from them showed high accuracy and clinical applicability. The nomogram established in this study may help raise clinicians' awareness of these diseases and guide treatment and follow-up strategies.


Introduction
Lung cancer is one of the most common malignant tumors in the world and has the highest incidence and mortality rates. [1] With the decline in smoking, the development and application of early lung cancer screening programs, and the development of new targeted drugs, the 5-year survival rate of lung cancer has improved, but the prognosis remains poor. [1,2] One of the major reasons for the poor prognosis of lung cancer patients is long-term posttreatment complications. [3] Currently, information on the long-term survival complications and survival prognosis of lung cancer patients in China is relatively scarce.
Multiple primary cancers, a category of long-term survival complications of cancer, are characterized by the simultaneous or sequential appearance of 2 or more unrelated primary malignancies in the same patient. [3] With advances in cancer detection technology, the incidence of multiple primary cancers is increasing. However, clinicians often misdiagnose such patients as having metastatic lung cancer, lose confidence in their prognosis, and focus mostly on palliative and supportive treatment. Currently, the diagnosis and treatment of multiple primary cancers remain a major challenge.
The Surveillance, Epidemiology, and End Results (SEER) database collects relatively complete demographic and clinical data for assessing the epidemiological characteristics of cancer. [4] The currently used American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system has limitations in predicting the prognosis of multiple primary cancers, whereas the nomogram is widely used as a tool for individualized cancer prognosis prediction. Using a nomogram to quantify relevant factors can better reflect the impact of cancer- and treatment-related information on patient survival. [5] Therefore, this study used the SEER database to construct a nomogram model clarifying the long-term complications and survival prognosis of lung cancer patients at the population level, in order to provide a reference for the tertiary prevention and survival prediction of multiple primary cancers in clinical practice.

Case collections
In this study, data on patients diagnosed with lung cancer from 2010 to 2015 were retrieved from the SEER-18 database (covering approximately 28% of the US population) using SEER*Stat software (version 8.4.0.1). The extracted information included demographic characteristics, histological type, treatment modality, and other relevant information. Specific indicators included age at diagnosis, race, sex, marital status, histological type, lesion site, degree of differentiation, TNM stage, treatment (surgery, lymph node dissection, radiotherapy, chemotherapy), and survival time for the first primary cancer (FPC) and second primary cancer (SPC). Inclusion criteria were: (1) lung cancer as the FPC; (2) double primary cancer meeting the criteria for primary tumors; and (3) age ≥18 years at diagnosis of the first primary lung cancer (FPLC).
Exclusion criteria were: (1) missing or incomplete clinical data, such as race or marital status; (2) survival time of 0 or loss to follow-up; and (3) 3 or more primary cancers.
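The inclusion and exclusion criteria above amount to a record-level filter. The sketch below is a minimal illustration, assuming a hypothetical, already-extracted record type (`Patient` and its field names are our own, not SEER*Stat variable names) and assuming that histopathological confirmation and the primary-tumor criteria were already applied during extraction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    """Hypothetical, simplified record; field names are illustrative only."""
    patient_id: str
    first_primary_site: str         # site of the first primary cancer, e.g. "lung"
    n_primaries: int                # number of primary malignancies recorded
    age_at_fplc: int                # age (years) at diagnosis of the first primary lung cancer
    race: Optional[str]             # None = missing
    marital_status: Optional[str]   # None = missing
    survival_months: Optional[int]  # None = lost to follow-up


def is_eligible(p: Patient) -> bool:
    """Apply the stated inclusion/exclusion criteria to one record."""
    # Inclusion: lung cancer as the FPC and age >= 18 at FPLC diagnosis.
    if p.first_primary_site != "lung" or p.age_at_fplc < 18:
        return False
    # Requiring exactly two primaries also excludes patients with 3 or more.
    if p.n_primaries != 2:
        return False
    # Exclusion: missing race or marital status.
    if p.race is None or p.marital_status is None:
        return False
    # Exclusion: survival time of 0 or loss to follow-up.
    if p.survival_months is None or p.survival_months == 0:
        return False
    return True


def select_cohort(patients: List[Patient]) -> List[Patient]:
    """Keep only the records that satisfy every criterion."""
    return [p for p in patients if is_eligible(p)]
```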
The final study cohort was randomly divided into modeling and validation groups at a 7:3 ratio.
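A reproducible random 7:3 split can be sketched as follows. The study does not describe its allocation procedure, so this is an assumption; the function name and seed value are hypothetical. Note that an exact 7:3 cut of 9306 patients gives 6514 and 2792, slightly different from the reported 6516 and 2790, so the original allocation may instead have assigned each patient to a group independently at random.

```python
import random
from typing import List, Sequence, Tuple


def split_cohort(ids: Sequence[int], train_fraction: float = 0.7,
                 seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Shuffle patient IDs with a fixed seed and cut them at train_fraction."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_model = round(len(shuffled) * train_fraction)
    return shuffled[:n_model], shuffled[n_model:]


modeling, validation = split_cohort(range(9306))
print(len(modeling), len(validation))  # 6514 2792
```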

Statistical analysis
Statistical analysis was performed using SPSS software (version 26.0, IBM, USA) and R (version 4.1.0). Hazard ratios and 95% confidence intervals were derived from univariate and multivariate Cox proportional hazards regression analyses to identify independent prognostic factors. The discriminative accuracy of the nomogram was evaluated using the area under the receiver operating characteristic curve (AUC), and its clinical applicability was evaluated using calibration curves comparing predicted with observed survival. Survival curves were plotted using the Kaplan-Meier (K-M) method. A P value < .05 was considered statistically significant.
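The AUC at a fixed horizon (1, 3, or 5 years) can be read as the probability that a patient who died by that time had a higher predicted risk than a patient still alive after it. The sketch below computes this naive cumulative/dynamic AUC; it is our simplification, not the study's implementation. It simply drops patients censored before the horizon instead of applying the inverse-probability-of-censoring weights used by dedicated tools (for example, the R package timeROC).

```python
from typing import Sequence


def cumulative_dynamic_auc(times: Sequence[float], events: Sequence[int],
                           risk: Sequence[float], t: float) -> float:
    """Naive time-dependent AUC at horizon t (no censoring weights).

    Cases: event observed at or before t.
    Controls: still under observation after t (time > t).
    Patients censored at or before t are neither and are dropped.
    """
    cases = [r for ti, ei, r in zip(times, events, risk) if ti <= t and ei == 1]
    controls = [r for ti, r in zip(times, risk) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    concordant = 0.0
    for rc in cases:
        for rk in controls:
            if rc > rk:
                concordant += 1.0   # case ranked as higher risk than control
            elif rc == rk:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))
```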

Patient clinical characteristics
We included a total of 9306 patients: 6516 in the modeling group and 2790 in the validation group. The demographic characteristics and basic clinical features of the patients are shown in Table 1. Among patients whose FPC was lung cancer, the most common site of SPC was the respiratory system (54.0%), followed by the genitourinary and digestive systems, with some SPCs arising in various organs of the head and neck. We also summarized the sites of the FPLC in patients with double primary cancer and found that the most common site was the right upper lobe (33.3%), followed by the left upper lobe, right lower lobe, left lower lobe, and right middle lobe.

Univariate and multivariate prognostic analysis
In univariate analysis, we identified the following clinical characteristics as prognostic factors in patients with SPC: age, sex, race, and marital status, as well as histological type, lesion site, degree of differentiation, TNM stage, and treatment modality (surgery, lymph node dissection, radiotherapy, and chemotherapy) for both FPC and SPC (Table 2). Multivariate Cox analysis then identified 14 independent prognostic factors in lung cancer patients who developed an SPC.
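For reference, the standard Cox proportional hazards model underlying these analyses, its partial likelihood, and the usual Wald-type hazard ratio with 95% confidence interval can be written as below. The notation is ours, since the study does not report its equations: h_0(t) is the unspecified baseline hazard, x_i is the covariate vector of patient i, δ_i equals 1 for a cancer-specific death and 0 for censoring, and R(t_i) is the set of patients still at risk at time t_i.

```latex
\begin{align*}
h(t \mid \mathbf{x}) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right) \\
L(\boldsymbol{\beta}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{k \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_k\right)} \\
\mathrm{HR}_j &= \exp(\hat{\beta}_j), \qquad
95\%\ \mathrm{CI} = \left[\exp\!\left(\hat{\beta}_j - 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\right),\;
                          \exp\!\left(\hat{\beta}_j + 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\right)\right]
\end{align*}
```

In a univariate analysis, x contains a single covariate at a time; in the multivariate analysis all candidate covariates enter x jointly, so each HR is adjusted for the others. The partial likelihood as written assumes no tied event times; software typically handles ties with the Breslow or Efron approximation.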

Construction and validation of prediction models
Based on the 14 independent prognostic factors identified by multivariate Cox analysis, we developed a nomogram for predicting survival outcomes in patients who develop an SPC after a first diagnosis of lung cancer (Fig. 1). The tumor stage of the first primary lung cancer contributed the most to prognosis. In the modeling group, the AUCs for 1-, 3-, and 5-year survival were 0.86, 0.825, and 0.807, respectively, indicating high predictive accuracy (Fig. 2A). Calibration curves were plotted from the predicted and observed patient survival, with the vertical and horizontal axes representing the observed and model-predicted survival probabilities, respectively. The calibration curves of the modeling group indicated good clinical applicability (Fig. 3A). The model was then validated in the validation group: the AUCs also exceeded 0.75, and the calibration curves likewise showed a good linear relationship (Figs. 2B and 3B).
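A nomogram re-expresses the fitted Cox model on a points scale. The sketch below follows a common construction, assuming numerically coded predictors (0/1 dummies or ordinal codes), a non-centered linear predictor, and a known baseline survival S0(t) at the horizon of interest; all function names are hypothetical, and the coefficients and codes in the example are made-up illustrations, not the study's estimates. The predictor with the widest range of effect spans the full 0 to 100 point axis, and the total points are mapped back to the linear predictor lp to give S(t | x) = S0(t)^exp(lp).

```python
import math
from typing import Dict, List

Coefs = Dict[str, float]          # predictor -> Cox coefficient (beta)
Levels = Dict[str, List[float]]   # predictor -> possible numeric codes


def _min_effect(beta: float, codes: List[float]) -> float:
    """Smallest contribution beta * code over a predictor's possible codes."""
    return min(beta * c for c in codes)


def points_per_unit(coefs: Coefs, levels: Levels) -> float:
    """Scale so the predictor with the widest effect range spans 0-100 points."""
    widest = max(abs(b) * (max(levels[v]) - min(levels[v])) for v, b in coefs.items())
    return 100.0 / widest


def total_points(coefs: Coefs, levels: Levels, x: Dict[str, float]) -> float:
    """Sum each predictor's points, measured from its lowest-effect code."""
    scale = points_per_unit(coefs, levels)
    return sum((b * x[v] - _min_effect(b, levels[v])) * scale for v, b in coefs.items())


def predicted_survival(points: float, coefs: Coefs, levels: Levels, s0_t: float) -> float:
    """Map total points back to the linear predictor, then S(t|x) = S0(t) ** exp(lp)."""
    scale = points_per_unit(coefs, levels)
    lp = sum(_min_effect(b, levels[v]) for v, b in coefs.items()) + points / scale
    return s0_t ** math.exp(lp)


# Made-up example: two hypothetical predictors with invented coefficients.
coefs = {"stage_fplc": 0.9, "surgery_fplc": -0.6}
levels = {"stage_fplc": [0, 1, 2, 3], "surgery_fplc": [0, 1]}
pts = total_points(coefs, levels, {"stage_fplc": 3, "surgery_fplc": 0})
print(round(pts, 1))  # 122.2
print(round(predicted_survival(pts, coefs, levels, s0_t=0.95), 3))
```

Reading the printed nomogram by hand (summing axis points, then drawing a line down from the total-points axis) performs the same two steps graphically.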

Survival analysis
Each patient was assigned a risk score based on the independent prognostic factors included in the multivariate Cox regression analysis; patients with scores above the median were classified as high risk and the rest as low risk. Survival curves were then plotted using the K-M method to compare cancer-specific survival between the risk subgroups (Fig. 4).
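The median-based risk grouping and the Kaplan-Meier estimate can be sketched as below. This is an illustrative re-implementation under our own assumptions (risk scores already computed, one time/event pair per patient with event = 1 for cancer-specific death); the function names are hypothetical, and in practice the curves would normally come from an established survival package.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def risk_groups(scores: Sequence[float]) -> List[str]:
    """Label scores above the median 'high' and all others 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) at each distinct event time.

    Patients censored at an event time stay in the risk set for that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed
    return curve


def curves_by_group(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> Dict[str, List[Tuple[float, float]]]:
    """Split patients at the median risk score and estimate one curve per group."""
    groups = risk_groups(scores)
    out = {}
    for g in ("low", "high"):
        idx = [k for k, lab in enumerate(groups) if lab == g]
        out[g] = kaplan_meier([times[k] for k in idx], [events[k] for k in idx])
    return out
```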

Discussion
China bears a heavy lung cancer burden, with about 820,000 new lung cancer cases and 710,000 lung cancer deaths in 2020. [1] In recent years, with changes in patients' lifestyles, improved compliance, and advances in imaging technology, the detection rate of double primary cancer has gradually increased, attracting the attention of clinicians. However, most previous studies have focused on single primary lung cancer and multiple primary lung cancer. [6][7][8][9] Fewer studies have examined lung cancer followed by an SPC, which is easily confused with metastatic lung cancer, leading to loss of patient confidence in treatment and a poor prognosis. We therefore retrospectively analyzed 9306 patients in the SEER database who were diagnosed with lung cancer as their first cancer from 2010 to 2015 and who developed an SPC, with the aim of raising clinicians' awareness by developing prognostic prediction models for this disease. We hope that our results will help clinical practitioners focus on identifying multiple primary cancers and prolong patient survival. By assessing prognostic factors in patients with double primary cancer in whom lung cancer was diagnosed first, we developed a survival nomogram for predicting the prognosis of this disease. Identifying the common sites of SPC after lung cancer is clinically important for improving the effectiveness and intensity of follow-up of oncology patients. The independent prognostic factors we identified included age, sex, race, and marital status, as well as histological type, lesion site, degree of differentiation, TNM stage, and treatment modality (surgery, lymph node dissection, radiotherapy, and chemotherapy).
Among patients whose FPC was lung cancer, the FPLC was most often located in the right upper lobe (33.3%), and the most common site of SPC was the respiratory system (54.0%). In addition, understanding the treatment options for patients with double primary cancers helps clinicians individualize their treatment and follow-up plans. Studies of risk factors for multiple primary cancers have found that radiotherapy and chemotherapy for the first cancer may contribute to the development of SPC. [3] This is consistent with our study, which showed that surgery and lymph node dissection at the primary site were relatively protective factors, whereas radiotherapy and chemotherapy to the primary site were relative risk factors.
Figure 1. Survival nomogram for the prognosis of double primary cancer with lung cancer as the first primary cancer. The independent prognostic factors identified by multivariate Cox regression analysis (age, sex, race, marital status, histology (a), grade (a), stage (a), site (b), grade (b), stage (b), surgery (a/b), LN (a/b), radiotherapy (a/b), and chemotherapy (a/b)) predict survival outcomes in patients who develop a second primary cancer after a first diagnosis of lung cancer. The 1-, 3-, and 5-year survival rates for an individual patient are obtained by summing the points for each variable, locating the total on the total points axis, and drawing a line downward. a = first primary cancer, b = second primary cancer, D = digestive system, G = genitourinary system, HN = head and neck, LN = lymph node dissection, LUAD = lung adenocarcinoma, LUSC = lung squamous carcinoma, N = no, R = respiratory system, SCLC = small cell lung cancer, Y = yes.
This study has some limitations. First, because it is based on the SEER database, it cannot analyze certain common risk factors, such as smoking, alcohol consumption, and genetic information. [10] Second, it did not analyze patients' treatment modalities in depth, such as surgical approach, radiotherapy modality (e.g., stereotactic radiation therapy and tumor ablation), chemotherapy regimen, and drug selection; these require further study. Third, the SEER database is an open database established by the National Cancer Institute, and its population is predominantly White, so data from Chinese patients are lacking. Further multicenter, prospective studies by researchers and clinicians in various fields may be needed to improve our prognostic risk prediction model. Finally, this study did not analyze factors related to the development of such disease or construct a nomogram for predicting the occurrence of SPC; we plan to address these in future work to support better diagnosis and follow-up strategies.
In conclusion, this study constructed a prognostic model with high accuracy and clinical applicability for patients who develop an SPC after a first diagnosis of lung cancer. As the number of detected multiple primary cancers increases year by year, clinicians should pay more attention to these diseases. Early detection, diagnosis, and treatment may substantially improve the survival time and quality of life of such patients.